PRESERVEMETHOD N.baz() WHEN LIVEMETHOD L.bar()






U.S. Patent Apr. 8, 2003 Sheet 2 of 4 US 6,546,551 B1



FIG. 3 

import java.lang.Class;
import java.lang.reflect.*;

public class L {
  static void foo() {
    try {
      Class c = Class.forName("M");
      M m = (M)c.newInstance();
      Field fld = c.getField("x");
      fld.setInt(m, 10);
    }
    catch (Exception e){ ... }
  }

  static void bar() {
    try {
      Class c = Class.forName("N");
      N n = (N)c.newInstance();
      Method method = c.getMethod("baz", new Class[0]);
      method.invoke(n, new Object[0]);
    }
    catch (Exception e){ ... }
  }
};

class M {
  public int x;
};

class N {
  public void baz() { ... }
};
FIG. 5

import java.lang.Class;
import java.lang.reflect.*;

public class A {
  public static void main(String args[]) {
    L.foo();
    Class cls = Class.forName("B");
    B b = (B)cls.newInstance();
    Method method = cls.getMethod("zip", new Class[0]);
    method.invoke(b, new Object[0]);
    C c = new C();
    c.zap();
  }
};

class B {
  void zip() { ... };
};

class C {
  void zap(){ ... };
  void unused(){ ... };
};


FIG. 6 



IMPORT library.conf
PRESERVECLASS B
PRESERVEMETHOD B.zip()
FIG. 7

import java.lang.Class;
import java.lang.reflect.*;

public class L {
  static void foo() {
    try {
      Class c = Class.forName("M");
      M m = (M)c.newInstance();
      Field fld = c.getField("x");
      fld.setInt(m, 10);
    }
    catch (Exception e){ ... }
  }
};

class M {
  public int x;
};

public class A {
  public static void main(String args[]) {
    L.foo();
    Class cls = Class.forName("B");
    B b = (B)cls.newInstance();
    Method method = cls.getMethod("zip", new Class[0]);
    method.invoke(b, new Object[0]);
    C c = new C();
    c.zap();
  }
};

class B {
  void zip() { ... };
};

class C {
  void zap(){ ... };
};

METHOD FOR ACCURATELY EXTRACTING
LIBRARY-BASED OBJECT-ORIENTED 
APPLICATIONS 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention generally relates to object-oriented 
programming (OOP), and in particular to OOP systems 
supporting the C++ and Java™ programming languages. 

2. Description of the Related Art

Object-oriented programming languages provide a num- 
ber of features such as classes, inheritance, and virtual 
methods. These object-oriented features have several advan- 
tages. In particular, they enable the creation of class libraries 
that can be reused by many different applications, in many 
different contexts. 

Class libraries are usually distributed separately from 
applications that use them. A disadvantage of this traditional 
distribution model is that the shipped class libraries can be 
very large, and hence require large amounts of space to store 
them, and large amounts of time to download them. In cases 
where an application only uses a small part of a class 
library's functionality, distribution of the entire library is 
often undesirable, because the user effectively pays a penalty 
for unused library features. A more detailed description of 
such problems is set forth in Tip et al., "Practical experience 
with an application extractor for java™," In Proceedings of 
the Fourteenth Annual Conference on Object-Oriented Pro- 
gramming Systems, Languages, and Applications 
(OOPSLA'99) (Denver, Colo., 1999), herein incorporated 
by reference in its entirety. 

To address this problem, application extraction tools have 
been designed and implemented. Such tools are discussed in 
Agesen et al., "Sifting out the gold: Delivering compact 
applications from an exploratory object-oriented program- 
ming environment," In Proceedings of the Ninth Annual 
Conference on Object-Oriented Programming Systems,
Languages, and Applications (OOPSLA'94) (Portland,
Oreg., 1994), ACM SIGPLAN Notices 29(10), pp. 355-370; 
Agesen, "Concrete Type Inference: Delivering Object- 
Oriented Applications," Sun Microsystems Laboratories 
Technical Report SMLI TR-96-52, December 1995; Tip et 
al., "Practical experience with an application extractor for 
java™," In Proceedings of the Fourteenth Annual Confer- 
ence on Object-Oriented Programming Systems, 
Languages, and Applications (OOPSLA'99) (Denver, Colo.,
1999); IBM Smalltalk User's Guide, version 3, release 0 ed., 
IBM Corp, 1995, Chapters 36-38; Smalltalk/V for Win32 
Programming, Digitalk Inc., 1993, Chapter 17; and Parc- 
Place Smalltalk, objectworks release 4.1 ed., 1992, Sections 
16, 28; herein incorporated by reference in their entirety. 
Such tools can perform a static whole-program analysis of
the application along with the libraries that it depends on to
determine the parts of the library and the application that are
used. Subsequently, program transformations and optimizations
are performed that eliminate the unused functionality
of the application and the library, thereby reducing both 
application size and download time. 

Modern object-oriented programming environments such 
as the Java™ platform load object-oriented programs
dynamically, create object instances dynamically when they 
are needed, and link such object instances dynamically for 
execution. In addition, such platforms typically include a 
reflection mechanism by which an object-oriented program 
can fetch information about a class of objects, or access 
program components by specifying their name. For
example, a program can inquire about the name of a class
associated with an object reference, or the number of methods
defined in a class. Dynamic loading is another example
of reflection. Here, the programmer instructs the platform to
load a class with a specified name, after which instances of
this class can be created. For a more detailed description of
the Java™ reflection mechanism, see McManis, "Take an
in-depth look at the Java™ Reflection API",
http://www.javaworld.com/jw-09-1997/jw-09-indepth.html,
herein incorporated by reference in its entirety. The use of
reflection poses a problem for application extraction tools
because a static analysis alone cannot determine which
classes are instantiated using reflection, and which methods
are invoked using reflection. Without this information, a safe
approximation of the application's call graph cannot be
constructed; and without a safe call graph, it is unclear
which methods are unused so that extraction of the application
is impossible.
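
The reflective queries described above can be illustrated with a minimal sketch that uses only the standard java.lang.Class and java.lang.reflect facilities. The names ReflectionSketch and Widget are hypothetical and do not appear in the figures; the point is only that the class instantiated by instantiateByName depends on a string value that a static analysis generally cannot determine.

```java
// Hypothetical example; ReflectionSketch and Widget are illustrative names only.
class ReflectionSketch {
    // A small class used purely as a reflection target.
    static class Widget {
        public int size;
        public void grow() { size++; }
        public void shrink() { size--; }
    }

    // Returns the name of the class associated with an object reference.
    static String classNameOf(Object o) {
        return o.getClass().getName();
    }

    // Returns the number of methods declared directly in the given class.
    static int declaredMethodCount(Class<?> c) {
        return c.getDeclaredMethods().length;
    }

    // Dynamic loading: loads a class by name and creates an instance of it.
    // When the name is computed at run time, a static analysis cannot tell
    // which class is instantiated here.
    static Object instantiateByName(String name) {
        try {
            Class<?> c = Class.forName(name);
            return c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + name, e);
        }
    }

    public static void main(String[] args) {
        Object w = instantiateByName(Widget.class.getName());
        System.out.println(classNameOf(w));
        System.out.println(declaredMethodCount(Widget.class));
    }
}
```

Nothing in the body of instantiateByName mentions Widget, which is why an extraction tool that sees only the code needs additional information to keep that class.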

In order to handle applications that use reflection, or
applications that use class libraries in which reflection is
used, application extraction tools require additional information
from the user. In current application extraction tools,
this information takes the form of a list of the classes,
methods, and fields in an application that are accessed using
reflection. This information is then used to construct a safe
approximation of the call graph, and the application can be
extracted safely. The drawbacks of this approach have to do
with the fact that the set of program components accessed
using reflection in a class library depends on the library
features used by an application. Hence, if a user wants to
extract multiple applications with regard to the same library,
he is faced with two options:
(1) Construct a global list of program components in the
library that may be accessed anywhere in the class
library using reflection. This list can safely be used for
extracting any applications with respect to the library;
or

(2) For a given application that is to be extracted with
respect to the library, construct a list of program
components in the library that are accessed using
reflection in the parts of the library used by that
application.

Option (1) has the advantage that only a single "configuration
file" needs to be written for the library, but it has the
disadvantage of being overly conservative: the extracted
applications may contain parts of the library that they do not
use.

Option (2) has the advantage that each application is
extracted only with the parts of the library that it uses, but
it has the disadvantage that a separate configuration file is
required for each application.
Therefore, there is a need in the art to provide a mechanism
for accurately and efficiently extracting object-oriented
components of a library that are potentially used in the
execution of multiple applications.

SUMMARY OF THE INVENTION 

The problems presented above and the related problems
of the prior art are solved by the present invention, method,
and apparatus for accurately extracting library-based
object-oriented applications. The present invention is capable of
accurately extracting multiple applications with respect to a
class library. The invention relies on a single configuration
file for the library, which describes how program components
in the library should be preserved under specified
conditions. The invention may be used in application extraction
tools, and in tools that aim at enhancing performance
using whole-program optimizations.

The invention may be used as an optimization to reduce
application size by eliminating unreachable methods. In the
alternative, the invention may be used as a basis for optimizations
that reduce execution time (e.g., by means of call
devirtualization), and as a basis for tools for program
understanding and debugging.


BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram illustrating a data processing 
system on which the subject invention may be implemented. 

FIG. 2 shows a schematic overview of the design of an
application extractor.

FIG. 3 shows an example of a class library that uses
several reflection features.

FIG. 4 shows a configuration file for the class library of
FIG. 3.

FIG. 5 shows an example of an application that uses the
class library of FIG. 3.

FIG. 6 shows a configuration file appl.conf for the application
of FIG. 5.

FIG. 7 shows a source-level view of the application of
FIG. 5 as produced by the application extractor of the
present invention.

DETAILED DESCRIPTION OF THE
INVENTION

Referring now to the drawings, and more particularly to
FIG. 1, there is shown a representative data processing
apparatus on which the subject invention may be implemented.
The computer processing apparatus includes
memory 101 and a central processing unit (CPU) 103. The
memory 101 typically includes main memory and cache
memory for storing instructions to be executed by the CPU
103 and data to be used in the execution of such instructions.

The CPU 103 is attached via system bus 112 to user
interface adapter 107. Typically, the user interface adapter
107 has attached to it a keyboard, a mouse, and/or other user
interface. In addition, a display device 105 (such as a
cathode ray tube display or a liquid crystal display) is
connected to the system bus 112 via a display adapter 104.

The computer system's operating system (and other
utilities), application program code and data are stored in
persistent memory and temporarily loaded into memory 101
for execution by the CPU 103. The persistent memory is
typically provided by a disk drive 108 coupled to the CPU
via system bus 112. In addition, persistent memory may be
provided by remote resources coupled to the CPU 103 via
the system bus 112 and a communication link 109. In this
case, portions of the computer system's operating system (or
other utilities), and portions of the application program code
and data may be retrieved from remote resources via the
communication link 109 and loaded into memory 101 for
execution by the CPU 103. The methodology of the present
invention as described below is preferably implemented as
application program code that is stored in persistent memory
(or retrieved from remote resources) and loaded into
memory 101 for execution by the CPU 103.

In order to better understand the invention, some background
material is presented regarding the notions of class
hierarchies and virtual method dispatch in object-oriented
programming languages. The example programs discussed
in this document are written in the Java programming
language. For a definition of the Java language, refer to
James Gosling, Bill Joy, and Guy Steele, "The Java Language
Specification", Addison-Wesley, 1996, herein incorporated
by reference in its entirety.

The following aspects of class hierarchies are relevant for
the present invention:
A class hierarchy contains a set of classes. Note that in
  some languages (e.g., Java) the term interface or
  abstract class is used to refer to a class whose functionality
  is restricted. Thus, a Java interface may be
  viewed as a class that cannot be instantiated, and which
  only specifies the signatures of the methods that it
  contains. Java interfaces can be treated as classes for
  the purposes of the method described in this document.
Each class in the hierarchy contains a set of members,
  which includes a set of (virtual) methods and fields to
  be included in objects that are instances of the class.
A class hierarchy contains a set of inheritance relations
  between classes. A class can extend the functionality of
  another class by deriving from it. The former class is
  referred to as the subclass or derived class, whereas the
  latter class is known as the superclass or base class.
  (Note that some languages, e.g., C++, allow a class to
  have multiple base classes).
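
The terminology above can be made concrete with a small sketch. All names here (HierarchySketch, Shape, Base, Derived) are hypothetical and chosen only for illustration; the sketch shows an interface, a base class with a field and a virtual method, a derived class that overrides the method, and a virtual call whose target depends on the run-time class of the receiver.

```java
// Hypothetical names throughout; this only illustrates the terminology.
class HierarchySketch {
    // An interface: cannot be instantiated, only specifies method signatures.
    interface Shape {
        String describe();
    }

    // Base class (superclass) with a field member and a virtual method member.
    static class Base implements Shape {
        protected int id = 1;
        public String describe() { return "Base#" + id; }
    }

    // Derived class (subclass): inherits the field id and overrides describe().
    static class Derived extends Base {
        @Override
        public String describe() { return "Derived#" + id; }
    }

    // The call s.describe() is a virtual call: the method that runs is chosen
    // by the run-time class of s, not by its declared type Shape.
    static String dispatch(Shape s) {
        return s.describe();
    }

    public static void main(String[] args) {
        System.out.println(dispatch(new Base()));
        System.out.println(dispatch(new Derived()));
    }
}
```

A call graph builder that sees only the call site in dispatch must consider every implementation of describe() in the hierarchy unless it has further information, such as which classes are instantiated.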
Application Extractor 

FIG. 2 shows a schematic overview of an application 
extractor in which the present invention is incorporated. The 
application extractor consists of the following components:
A loader which loads the files that comprise the application
  and its libraries, and constructs an internal representation
  of the application. In the Java™ platform, the
  loader loads class files that comprise the application
  and its libraries.
A call graph builder which analyzes the internal
  representation, and constructs a safe approximation of
  the application's call graph, which identifies all methods
  that may be potentially executed when the application
  is run (also referred to herein as "live"). Any
  method that does not occur in the call graph is definitely
  not executed when the application is run. In constructing
  the application's call graph, the call graph builder
  uses a configuration file, the details of which are set
  forth below.

After construction of the call graph, an optimizer is
  preferably used to remove unreachable methods and
  useless fields (See Sweeney, P. F., and Tip, F., "A study
  of dead data members in C++ applications," In Proceedings
  of the ACM SIGPLAN '98 Conference on
  Programming Language Design and Implementation
  (Montreal, Canada, June 1998), ACM SIGPLAN
  Notices 33(6), pp. 324-332, herein incorporated by
  reference in its entirety). In addition, the optimizer may
  perform optimizations to the methods that remain. Such
  optimizations may include class hierarchy
  transformations (See Tip et al., "Slicing class
  hierarchies in C++," In Proceedings of the Eleventh
  Annual Conference on Object-Oriented Programming
  Systems, Languages, and Applications (OOPSLA'96)
  (San Jose, Calif., 1996), ACM SIGPLAN Notices
  31(10), pp. 179-197; and Tip, F., and Sweeney, P.,
  "Class hierarchy specialization," In Proceedings of the
  Twelfth Annual Conference on Object-Oriented Programming
  Systems, Languages, and Applications
  (OOPSLA'97) (Atlanta, Ga., 1997), ACM SIGPLAN
  Notices 32(10), pp. 271-285; herein incorporated by
  reference in their entirety), call devirtualization, and
  traditional compiler optimizations such as common
  subexpression elimination, constant propagation, and
  strength reduction (See A. V. Aho et al., "Compilers
  Principles, Techniques and Tools," Addison-Wesley,
  1986; and Zima, H. and Chapman, B., "Supercompilers
  for Parallel and Vector Computers," ACM Press, New
  York, 1991; herein incorporated by reference in their
  entirety). Another optimization performed by this component
  may be the renaming of classes, methods, and
  fields (See Tip et al., "Practical experience with an
  application extractor for java™," In Proceedings of the
  Fourteenth Annual Conference on Object-Oriented
  Programming Systems, Languages, and Applications
  (OOPSLA'99) (Denver, Colo., 1999), incorporated by
  reference above in its entirety).
Finally, a writer component produces a set of files comprising
  the optimized application. These files may be a
  source representation of the program, an intermediate-level
  representation of the program, or a representation
  suitable for execution on a particular platform. For
  example, the writer component may produce files suitable
  for execution on the Java™ platform (in which
  case the writer component produces a set of class files
  that are suitable for execution on a Java™ virtual
  machine). An example of such a class writer component
  is described in Vallee-Rai et al., "Soot - a Java
  Bytecode Optimization Framework," Proceedings of
  CASCON'99, Nov. 8-11, 1999, Mississauga, Ontario,
  Canada, available from http://www.sable.mcgill.ca/publications/#cascon99
  as of the filing of the present
  invention, herein incorporated by reference in its
  entirety.
The constitutions and functions of these elements are well
known in the art and will not be otherwise described here.
For example, further description on the various functionality
of the tool and/or compiler described above may be found in
A. V. Aho et al., "Compilers Principles, Techniques and
Tools," Addison-Wesley, 1986, incorporated by reference
above in its entirety.

Several application extractors have been described in the
literature. Such tools are discussed in Agesen et al., "Sifting
out the gold: Delivering compact applications from an
exploratory object-oriented programming environment," In
Proceedings of the Ninth Annual Conference on Object-Oriented
Programming Systems, Languages, and Applications
(OOPSLA'94) (Portland, Oreg., 1994), ACM SIGPLAN
Notices 29(10), pp. 355-370; Agesen, "Concrete Type
Inference: Delivering Object-Oriented Applications," Sun
Microsystems Laboratories Technical Report SMLI TR-96-52,
December 1995; Tip et al., "Practical experience with an
application extractor for java™," In Proceedings of the
Fourteenth Annual Conference on Object-Oriented Programming
Systems, Languages, and Applications
(OOPSLA'99) (Denver, Colo., 1999); IBM Smalltalk User's
Guide, version 3, release 0 ed., IBM Corp, 1995, Chapters
36-38; Smalltalk/V for Win32 Programming, Digitalk Inc.,
1993, Chapter 17; and ParcPlace Smalltalk, objectworks
release 4.1 ed., 1992, Sections 16, 28; incorporated by
reference above in their entirety.
Configuration Files

The configuration files used by the application extractor
may include unconditional and conditional directives.
Unconditional directives specify a set of components of a
program that are to be classified as potentially executable
unconditionally. In an object-oriented programming platform
such as Java™, the components of a program that are
specified by a directive preferably include one or more
classes of objects, and/or methods and fields to be included
in objects that are instances of a class.

Unconditional directives preferably take one of the following
forms:
PRESERVECLASS <className>
PRESERVEMETHOD <methodName>
PRESERVEFIELD <fieldName>
The semantics of these exemplary unconditional directives
are as follows.

Any class C listed in the configuration file using the
PRESERVECLASS directive is assumed to be instantiated,
and the application extractor will not make assumptions
about where objects of type C may occur in the application.
Furthermore, the application extractor will assume that the
identity of class C must be preserved. This implies that C
cannot be removed, renamed or merged into another class.

Any method m listed in the configuration file using the
PRESERVEMETHOD directive is assumed to be executed,
and the application extractor will not make assumptions
about where method m is called from. Furthermore, the
application extractor will assume that the identity of method
m must be preserved. This implies that method m cannot be
removed or renamed, and that the identity of any class C
referenced in m's signature must be preserved as well.

Any field f listed in the configuration file using the
PRESERVEFIELD directive is assumed to be accessed and
the application extractor will not make assumptions about
where field f is accessed from. Furthermore, the application
extractor will assume that the identity of field f must be
preserved. This implies that f cannot be removed or
renamed, and that the identity of any class C referenced in
f's type must be preserved as well.

Conditional directives specify: i) a condition associated
with a first set of components of the program (e.g., the first
set of components of the program are determined to be
potentially executable when the program is run); and ii) a
second set of components of the program that are to be
classified as potentially executable in the event that the
condition is satisfied. Conditional directives preferably take
one of the following forms:
PRESERVECLASS <className> WHEN
  LIVEMETHOD <methodName>
PRESERVEMETHOD <methodName1> WHEN
  LIVEMETHOD <methodName2>
PRESERVEFIELD <fieldName> WHEN
  LIVEMETHOD <methodName>
The semantics of these exemplary conditional directives are
as follows.

Any class C listed in the configuration file using a
conditional directive of the form PRESERVECLASS C
WHEN LIVEMETHOD m is assumed to be instantiated
if method m is determined to be live. In this case, the
application extractor will not make assumptions about where
objects of type C may occur in the application, and the
application extractor will assume that the identity of class C
must be preserved. This implies that C cannot be removed,
renamed or merged into another class.

Any method m listed in the configuration file using a
conditional directive of the form PRESERVEMETHOD m
WHEN LIVEMETHOD n is assumed to be
executed if method n is determined to be live. In this event,
the application extractor will not make assumptions about
where method m is called from. Furthermore, it will be
assumed that the identity of method m must be preserved.
This implies that m cannot be removed or renamed, and that
the identity of any class C referenced in m's signature must
be preserved as well.

Any field f listed in the configuration file using a conditional
directive of the form PRESERVEFIELD f WHEN
LIVEMETHOD m is assumed to be accessed if
method m is determined to be live. In this case, it will be
assumed that the identity of field f must be preserved. This
implies that f cannot be removed or renamed, and that if the
type of f is a class C, then C's identity must be preserved as
well.

The IMPORT directive serves to enable modular composition
of configuration files, and takes the form: IMPORT
<fileName>, where <fileName> is the name of another
configuration file. This has the same effect as adding the
contents of <fileName> to the configuration file containing
the IMPORT directive.
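
As an illustration of the directive forms above, the following sketch parses configuration lines into a simple record and handles IMPORT by splicing in the directives of the imported file. It is only one possible reading of the syntax, not the extractor's actual parser: the class ConfigSketch, the Directive type, and the in-memory map that stands in for the file system are all assumptions, and component names are assumed to contain no whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a configuration-file reader; not the patent's code.
class ConfigSketch {
    private static final List<String> KINDS =
        Arrays.asList("PRESERVECLASS", "PRESERVEMETHOD", "PRESERVEFIELD");

    // One parsed directive; condition is null for unconditional directives.
    static final class Directive {
        final String kind;      // PRESERVECLASS, PRESERVEMETHOD or PRESERVEFIELD
        final String target;    // the component to preserve
        final String condition; // the method that must be live, or null
        Directive(String kind, String target, String condition) {
            this.kind = kind; this.target = target; this.condition = condition;
        }
    }

    // Parses the named file from an in-memory map of file name -> lines
    // (an assumed stand-in for reading files from disk).
    // Note: cyclic IMPORTs are not detected in this sketch.
    static List<Directive> parse(String file, Map<String, List<String>> files) {
        List<String> lines = files.get(file);
        if (lines == null) throw new IllegalArgumentException("unknown file: " + file);
        List<Directive> out = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            String[] t = line.split("\\s+");
            if (t[0].equals("IMPORT") && t.length == 2) {
                // Same effect as adding the imported file's contents here.
                out.addAll(parse(t[1], files));
            } else if (KINDS.contains(t[0]) && t.length == 2) {
                out.add(new Directive(t[0], t[1], null));
            } else if (KINDS.contains(t[0]) && t.length == 5
                    && t[2].equals("WHEN") && t[3].equals("LIVEMETHOD")) {
                out.add(new Directive(t[0], t[1], t[4]));
            } else {
                throw new IllegalArgumentException("bad directive: " + line);
            }
        }
        return out;
    }
}
```

Parsing the three lines of FIG. 6 against a library.conf that contains the directive PRESERVEMETHOD N.baz() WHEN LIVEMETHOD L.bar() would yield that conditional directive first, followed by the two unconditional directives for B.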
Use of Configuration Files by the Call Graph Builder 

The call graph builder preferably operates as follows. The
operation starts with a set of "root" methods that are either
directly invoked by the user, or by the run-time system (as
is for example the case for applets). Then, the following
steps are repeatedly performed:

(1) The body of a reached method is analyzed and
information is collected. Although different call graph
construction algorithms determine different
amounts of information, all such algorithms identify a
set of call sites in the method.

(2) For a given call site, a set of target methods is
determined. A "target method" denotes a method that is
reachable from a virtual call site via a dynamic dispatch.
Different algorithms use a varying amount of
information to do this. For example, class hierarchy
analysis only uses the class hierarchy, and rapid type
analysis (RTA) also uses instantiated class information.
Target methods that have not been analyzed yet using
step (1) need to be scanned. Some algorithms reanalyze
a method as more detailed information becomes available.

The present invention is preferably integrated into the
framework described above as follows. As a method is
determined to be reachable, the configuration file is searched
for "matching" conditions. If a conditional rule of the form
"PRESERVEMETHOD m WHEN LIVEMETHOD n" is found
for a method n that has been determined to be reachable,
method m will be analyzed using step (1). Note that this may
give rise to the identification of additional call sites to be
resolved in step (2).

In addition, the conditional rules of the form: 
PRESERVECLASS c WHEN LIVEMETHOD m 
PRESERVEFIELD f WHEN LIVEMETHOD m 
are useful for algorithms that rely on additional information 
to determine targets of virtual call sites. For example, RTA
keeps track of the classes that are instantiated in the appli- 
cation in order to approximate the target methods that can be 
reached from virtual method calls. Any class that is found to 
be instantiated using a "PRESERVECLASS c WHEN 
LIVEMETHOD m" condition has to be taken into account. 
Any call-graph construction algorithm can be adapted to 
accommodate the configuration file directives described 
above. As an example, we will give a high-level overview
of how the RTA algorithm can be adapted to take into account
the exemplary conditional and unconditional directives set 
forth above. The RTA algorithm is described in detail in 
Bacon, D. F, "Fast and Effective Optimization of Statically 
Typed Object-Oriented Languages", Computer Science 
Division, University of California, Berkeley, Report No. 
UCB/CSD-98-1017, December 1997, and Bacon, D. F., and 
Sweeney, P. F., "Fast static analysis of C++ virtual function 
calls," In Proceedings of the Eleventh Annual Conference on
Object-Oriented Programming Systems, Languages, and
Applications (OOPSLA'96) (San Jose, Calif., 1996), SIG- 
PLAN Notices 31(10), pp. 324-341, herein incorporated by 
reference in their entirety. We will describe an adaptation of 
RTA that simultaneously computes a safe approximation of 
the set of accessed fields as the call graph is constructed, 
along the lines described in Sweeney, P. F, and Tip, F, "A 
study of dead data members in C++ applications," In Pro- 
ceedings of the ACM SIGPLAN '98 Conference on Program- 
ming Language Design and Implementation (Montreal, 
Canada, June 1998), ACM SIGPLAN Notices 33(6), pp. 
324-332 (although we will not make the distinction between 
read-access and write-access to fields here). Other algo- 
rithms that can be similarly adapted include class hierarchy 
analysis (see Dean et al., "Optimization of object-oriented 
programs using static class hierarchy analysis," In Proceed- 
ings of the Ninth European Conference on Object-Oriented 
Programming (ECOOP '95) (Aarhus, Denmark, August 
1995), W. Olthoff, Ed., Springer-Verlag, pp. 77-101, incorporated
by reference in its entirety), and type propagation-based
algorithms (see Agesen, "Concrete Type Inference:
Delivering Object-Oriented Applications," Sun Microsys- 
tems Laboratories Technical Report SMLI TR-96-52, 
December 1995, incorporated by reference in its entirety). 

RTA is an iterative algorithm that maintains the following 
information: 

data identifying a set M of methods to which virtual calls
  have been encountered,
data identifying a set I of classes that may be instantiated
  during execution,
data identifying a set R of methods that are reachable
  during execution, and
data identifying a set F of fields that may be accessed
  during execution.
RTA begins by performing a number of initializations.

(a) the data identifying the set R is initialized to identify
a set of initially reachable methods (such as an application's
main method), and the data identifying sets I,
M, and F are initialized to identify the empty set.
Then, RTA repeatedly performs the following steps until 
no more elements are added to any of the sets M, I, R, and 
F: 

(b) Process a method in R that has not been processed 
before. This involves scanning the method's code for 
virtual calls to other methods, and for instantiations of 
classes. If a virtual call to a method m that does not yet 
occur in M is encountered, data identifying m is added 
to the data identifying set M. If a direct call to a method 
n is encountered that does not yet occur in R, data 
identifying the method n is added to the data identify- 
ing set R. If an instantiation of a class c is encountered 
that does not yet occur in I, data identifying class c is 
added to the data identifying set I. If an access to a field 
f that does not yet occur in F is encountered, data 
identifying field f is added to the data identifying set F. 

(c) Resolve a virtual call to a method m in M w.r.t. an 
instantiated class c in I. This involves an upward 
traversal of the class hierarchy starting at class c, until 
a method m' is found with the same signature as m. If 
data identifying method m' does not already occur in
the data identifying set R, data identifying method m' is 
added to the data identifying set R. 

Steps (b) and (c) are performed repeatedly until no new
classes are added to set I, no new methods are added to set
R, no new methods are added to set M, and no new fields are
added to set F.
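
Steps (a) through (c) can be sketched as a small fixpoint computation. This is a simplified illustration under stated assumptions rather than the algorithm as published: the program model (the Body class, the superOf and bodies maps, and the "Class.signature" string encoding of methods) is hypothetical, and a real implementation would scan bytecode or an intermediate representation instead.

```java
import java.util.*;

// Hypothetical, simplified RTA sketch; the program model is an assumption.
class RtaSketch {
    // A method body reduced to the facts that step (b) scans for.
    static final class Body {
        final List<String> virtualCalls = new ArrayList<>(); // signatures, e.g. "zap()"
        final List<String> directCalls = new ArrayList<>();  // qualified, e.g. "L.foo()"
        final List<String> instantiated = new ArrayList<>(); // class names
        final List<String> fields = new ArrayList<>();       // qualified field names
    }

    final Map<String, String> superOf = new HashMap<>(); // class -> superclass
    final Map<String, Body> bodies = new HashMap<>();    // "Class.sig" -> body

    final Set<String> M = new LinkedHashSet<>(); // virtual-call signatures seen
    final Set<String> I = new LinkedHashSet<>(); // instantiated classes
    final Set<String> R = new LinkedHashSet<>(); // reachable methods
    final Set<String> F = new LinkedHashSet<>(); // accessed fields

    // Step (c): walk up the hierarchy from class c until sig is declared.
    String resolve(String c, String sig) {
        for (String k = c; k != null; k = superOf.get(k)) {
            if (bodies.containsKey(k + "." + sig)) return k + "." + sig;
        }
        return null;
    }

    void run(String... roots) {
        R.addAll(Arrays.asList(roots));             // step (a)
        Set<String> processed = new HashSet<>();
        boolean changed = true;
        while (changed) {                           // repeat until a fixpoint
            changed = false;
            for (String m : new ArrayList<>(R)) {   // step (b)
                if (!processed.add(m)) continue;    // already scanned
                changed = true;
                Body b = bodies.get(m);
                if (b == null) continue;            // no body known for m
                M.addAll(b.virtualCalls);
                R.addAll(b.directCalls);
                I.addAll(b.instantiated);
                F.addAll(b.fields);
            }
            for (String sig : M) {                  // step (c)
                for (String c : I) {
                    String target = resolve(c, sig);
                    if (target != null && R.add(target)) changed = true;
                }
            }
        }
    }
}
```

In this model, a method whose only caller uses reflection never enters R, because the reflective call contributes neither a direct call nor a virtual call; this is the gap that the configuration file directives are meant to fill.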
RTA only requires a few modifications to accommodate
configuration files with conditional and unconditional directives.
Unconditional directives are processed as follows:

(d) when the data identifying the set I of classes is
initialized, for each class c that is listed in an unconditional
directive of the form PRESERVECLASS c in
the configuration file, data identifying the class c is
added to the data identifying the set of instantiated
classes I.

(e) when the data identifying the set R of methods is
initialized, for each method m that is listed in an
unconditional directive of the form PRESERVEMETHOD m,
data identifying the method m is
added to the data identifying the set of reachable
methods R.

(f) when the data identifying the set F of accessed fields
is initialized, for each field f that is listed in an
unconditional directive of the form PRESERVEFIELD f,
data identifying the field f is added to the data
identifying the set of accessed fields F.

Preferably, each of the steps (d)-(f) is performed once,
immediately after performing the initialization of the sets in
step (a).

Conditional directives are processed as follows. 

(g) when data identifying a new method m (or m') is added
to the set R in step (b) (or step (c)), a check is performed
to determine whether the configuration file contains any
conditional directive of the form PRESERVECLASS x
when LIVEMETHOD m (or PRESERVECLASS x
when LIVEMETHOD m'). In the event that the
configuration file contains such a conditional directive,
data identifying the class x is added to the data
identifying set I.

(h) when data identifying a new method m (or m') is added
to the set R in step (b) (or step (c)), a check is performed
to determine whether the configuration file contains any
conditional directive of the form PRESERVEMETHOD m"
when LIVEMETHOD m (or PRESERVEMETHOD m" when
LIVEMETHOD m'). In the event that the configuration file
contains such a conditional directive, data identifying
the method m" is added to the data identifying set R.

(i) when data identifying a new method m (or m') is added
to the set R in step (b) (or step (c)), a check is performed
to determine whether the configuration file contains any
conditional directive of the form PRESERVEFIELD f
when LIVEMETHOD m (or PRESERVEFIELD f
when LIVEMETHOD m'). In the event that the
configuration file contains such a conditional directive,
data identifying the field f is added to the data
identifying the set F.

Preferably, steps (g)-(i) are performed repeatedly along
with steps (b) and (c), until no new elements are added
to the sets I, M, F, and R.
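
One possible way to organize the checks of steps (g)-(i) is to
index the conditional directives by their triggering method,
so that adding a method to R only inspects the directives that
depend on it. The sketch below is an assumption-laden
illustration: the class ConditionalDirectiveIndex and its
methods are hypothetical, and it additionally assumes that a
method added through step (h) may itself trigger further
directives, which the text does not state explicitly. Scanning
the bodies of newly added methods (step (b)) is not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of steps (g)-(i): conditional directives indexed by trigger method. */
final class ConditionalDirectiveIndex {
    enum Kind { CLASS, METHOD, FIELD }

    private static final class Rule {
        final Kind kind;
        final String target;
        Rule(Kind kind, String target) { this.kind = kind; this.target = target; }
    }

    // Trigger method -> directives of the form "PRESERVE... target WHEN LIVEMETHOD trigger".
    private final Map<String, List<Rule>> byTrigger = new HashMap<>();

    final Set<String> instantiated = new LinkedHashSet<>(); // set I
    final Set<String> reachable = new LinkedHashSet<>();    // set R
    final Set<String> accessed = new LinkedHashSet<>();     // set F

    void register(Kind kind, String target, String triggerMethod) {
        byTrigger.computeIfAbsent(triggerMethod, k -> new ArrayList<>())
                 .add(new Rule(kind, target));
    }

    /** Adds a method to R and applies every directive that this addition triggers. */
    void addReachable(String method) {
        Deque<String> pending = new ArrayDeque<>();
        if (reachable.add(method)) pending.push(method);
        while (!pending.isEmpty()) {
            String live = pending.pop();
            for (Rule r : byTrigger.getOrDefault(live, List.of())) {
                switch (r.kind) {
                    case CLASS: instantiated.add(r.target); break;        // step (g)
                    case FIELD: accessed.add(r.target);     break;        // step (i)
                    case METHOD:                                          // step (h)
                        // Assumption: a method preserved this way can trigger more directives.
                        if (reachable.add(r.target)) pending.push(r.target);
                        break;
                }
            }
        }
    }
}
```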

FIG. 3 shows a small class library in which reflection is
used. The library consists of 3 classes: L, M and N. Class L
has two methods: foo and bar. Method foo uses reflection to
dynamically load class M, create an object of type M, and
assign the value '10' to the integer field x of that object.
Method bar uses reflection to dynamically load class N,
create an object of type N, and invoke the baz method on that
object. For convenience, the term library class will
henceforth be used to refer to classes that occur in a class
library, and the term application class will be used to
refer to classes that are not part of a class library. It is
assumed that library classes do not inherit from application
classes.



FIG. 4 shows a configuration file library.conf for the class
library of FIG. 3. This configuration file specifies the
following:

class M is instantiated and should be preserved if method
L.foo( ) is live,

field M.x should be preserved if method L.foo( ) is live,

class N is instantiated and should be preserved if method
L.bar( ) is live, and

method N.baz( ) should be preserved if method L.bar( ) is
live.
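
Each of the four statements above corresponds to one directive
line of the form given in steps (g)-(i), for example
PRESERVECLASS M WHEN LIVEMETHOD L.foo( ). The following sketch
shows how such lines might be parsed. The DirectiveParser
class, its regular expression, and the decision to strip
whitespace inside names are assumptions; the exact grammar of
the configuration file is not specified in this section.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of a parser for PRESERVE directives, conditional or unconditional. */
final class DirectiveParser {
    static final class Directive {
        final String kind;    // "CLASS", "METHOD" or "FIELD"
        final String target;  // component to preserve
        final String trigger; // method after WHEN LIVEMETHOD, or null if unconditional
        Directive(String kind, String target, String trigger) {
            this.kind = kind; this.target = target; this.trigger = trigger;
        }
    }

    // Assumed line shape: PRESERVE<kind> <target> [WHEN LIVEMETHOD <method>]
    private static final Pattern LINE = Pattern.compile(
        "PRESERVE(CLASS|METHOD|FIELD)\\s+(.+?)(?:\\s+WHEN\\s+LIVEMETHOD\\s+(.+))?");

    static Directive parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PRESERVE directive: " + line);
        }
        String trigger = m.group(3) == null ? null : normalize(m.group(3));
        return new Directive(m.group(1), normalize(m.group(2)), trigger);
    }

    // Removes inner whitespace so that "L.foo( )" and "L.foo()" compare equal.
    private static String normalize(String name) {
        return name.replaceAll("\\s+", "");
    }
}
```

A line such as IMPORT library.conf is deliberately rejected
here; handling imports is a separate concern.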

FIG. 5 shows an application that uses the library of FIG.
3. Observe that this application dynamically loads class B,
that it uses reflection to create an instance of type B, and that
it uses reflection to invoke method B.zip( ) on that B-object.

FIG. 6 shows an exemplary configuration file appl.conf 
for this application. Observe that the configuration file 
library.conf is imported into this file, and that the class B and 
method B.zip ( ), which are accessed using reflection, are 
listed in unconditional directives as components that should 
be preserved. 
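
The effect of importing library.conf into appl.conf can be
pictured as flattening both files into one list of directives.
The sketch below works on in-memory file contents to stay
self-contained. The ImportResolver class, the "IMPORT <name>"
line shape, and the cycle guard are assumptions; the contents
used for library.conf in the usage note are reconstructed from
the description of FIG. 4.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: expand IMPORT lines so that one flat list of directives remains. */
final class ImportResolver {
    /** files maps an assumed file name to the lines of that configuration file. */
    static List<String> flatten(String file, Map<String, List<String>> files) {
        List<String> out = new ArrayList<>();
        collect(file, files, new HashSet<>(), out);
        return out;
    }

    private static void collect(String file, Map<String, List<String>> files,
                                Set<String> seen, List<String> out) {
        if (!seen.add(file)) return; // already imported; this also stops import cycles
        List<String> lines = files.get(file);
        if (lines == null) {
            throw new IllegalArgumentException("Unknown configuration file: " + file);
        }
        for (String line : lines) {
            String t = line.trim();
            if (t.startsWith("IMPORT ")) {
                // Imported directives are spliced in at the position of the IMPORT line.
                collect(t.substring("IMPORT ".length()).trim(), files, seen, out);
            } else if (!t.isEmpty()) {
                out.add(t);
            }
        }
    }
}
```

With appl.conf consisting of IMPORT library.conf,
PRESERVECLASS B and PRESERVEMETHOD B.zip( ), flattening would
yield the four conditional directives of the library followed
by the two unconditional directives of the application.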

We will now briefly discuss how the call graph construction
proceeds for the example application of FIG. 5 using
configuration file appl.conf.

First, the set R of reached methods is initialized to
{A.main( )}, and I, M, and F are initialized to the empty set
{ }. Hence, the initial values of these sets are as follows:

R={A.main( )}

I={ }

M={ }

F={ }

Next, the instructions in the body of method A.main( ) are
scanned. Since this method contains a virtual call to method
C.zap( ), a direct call to L.foo( ), and an instantiation of class
C, data identifying the method C.zap( ) is added to the
data identifying the set M, data identifying the method L.foo( )
is added to the data identifying the set R, and data
identifying the class C is added to the data identifying the set I.
Hence, we have the following situation:

R={A.main( ), L.foo( )}

I={C}

M={C.zap( )}

Adding data identifying the method L.foo( ) to the data
identifying the set R triggers a check for any conditional
directives dependent on this method. Since the imported
configuration file library.conf contains the lines

PRESERVECLASS M WHEN LIVEMETHOD L.foo( )

and

PRESERVEFIELD M.x WHEN LIVEMETHOD L.foo( )

data identifying the class M is added to the data identifying
the set I, and data identifying the field M.x is added to the
data identifying the set F.

Resolving the virtual call to C.zap( ) in M w.r.t. instan- 
tiated class C in I results in the identification of C.zap( ) as 
a reachable method. Hence, data identifying the method 
C.zap( ) is added to the data identifying the set R. Hence, we 
have the following situation: 

R={A.main( ), L.foo( ), C.zap( )}

I={C, M}

M={C.zap( )}

F={M.x}
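
The walkthrough can be replayed with a small simulation. This
is only a sketch under several assumptions: the
WalkthroughExample class and its string encoding of method
bodies ("call", "vcall", "new") are hypothetical, reflective
operations are omitted because the scan does not see them, the
class hierarchy is treated as flat, and only the conditional
directives imported from library.conf are applied so that the
result can be compared with the sets listed above.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: replay the example of FIG. 5 until the sets R, I, M and F stop growing. */
final class WalkthroughExample {
    final Set<String> R = new LinkedHashSet<>(); // reached methods
    final Set<String> I = new LinkedHashSet<>(); // instantiated classes
    final Set<String> M = new LinkedHashSet<>(); // virtually called methods
    final Set<String> F = new LinkedHashSet<>(); // accessed fields

    private static final List<String> EMPTY = List.of();

    // Assumed encoding of what the scan sees in each declared method body.
    static final Map<String, List<String>> BODY = Map.of(
        "A.main()", List.of("call L.foo()", "new C", "vcall C.zap()"),
        "L.foo()", EMPTY, "L.bar()", EMPTY, "B.zip()", EMPTY,
        "C.zap()", EMPTY, "C.unused()", EMPTY, "N.baz()", EMPTY);

    // Conditional directives of library.conf as {kind, target, trigger method}.
    static final List<String[]> RULES = List.of(
        new String[] {"CLASS", "M", "L.foo()"},
        new String[] {"FIELD", "M.x", "L.foo()"},
        new String[] {"CLASS", "N", "L.bar()"},
        new String[] {"METHOD", "N.baz()", "L.bar()"});

    void run() {
        R.add("A.main()");
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String m : List.copyOf(R)) {
                // Step (b): scan the body of each reached method.
                for (String ins : BODY.getOrDefault(m, EMPTY)) {
                    String arg = ins.substring(ins.indexOf(' ') + 1);
                    if (ins.startsWith("call ")) changed |= R.add(arg);
                    else if (ins.startsWith("vcall ")) changed |= M.add(arg);
                    else if (ins.startsWith("new ")) changed |= I.add(arg);
                }
                // Steps (g)-(i): apply directives whose trigger method is live.
                for (String[] r : RULES) {
                    if (!r[2].equals(m)) continue;
                    Set<String> target = r[0].equals("CLASS") ? I
                                       : r[0].equals("FIELD") ? F : R;
                    changed |= target.add(r[1]);
                }
            }
            // Step (c): with a flat hierarchy, class c resolves a call
            // only if c itself declares a method with that signature.
            for (String call : List.copyOf(M)) {
                String sig = call.substring(call.indexOf('.') + 1);
                for (String c : I) {
                    if (BODY.containsKey(c + "." + sig)) changed |= R.add(c + "." + sig);
                }
            }
        }
    }
}
```

The order in which elements are added differs slightly from
the prose, since directives are checked on each pass rather
than at the moment a method enters R, but the sets at the
fixed point are the same.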

Once these values are reached, no new elements are added to
the sets R, I, M, and F. FIG. 7 shows a source-level view of
the application program produced by the application
extractor. Observe that the following components have been
removed:

library class N,

library method L.bar( ), and

application method C.unused( ).

Note that the representation of the application program
produced by the application extractor need not be a source
representation as shown in FIG. 7, but may be an
intermediate representation of the program, or a representation
suitable for execution on a particular platform, such as the
Java™ platform.

The application extractor may subsequently perform program
transformations and optimizations on the methods in the call
graph, with the constraint that any classes, methods, and
fields listed in an unconditional or conditional PRESERVE
directive should retain their identity. Specifically, classes
M and B, field M.x, and method B.zip( ) should retain their
identity, and should not be renamed or transformed.
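
As an illustration of the identity constraint, the sketch
below assigns fresh names to components while leaving
preserved ones untouched. Renaming is used here only as one
example of a transformation. The IdentityPreservingRenamer
class and the "n0", "n1", ... naming scheme are assumptions,
and component names are treated as opaque strings, which a
real renamer operating on class files would not do.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Sketch: rename components, but keep every preserved component's name. */
final class IdentityPreservingRenamer {
    static Map<String, String> rename(Collection<String> names, Set<String> preserved) {
        // Names that must never be handed out as fresh names.
        Set<String> taken = new HashSet<>(names);
        taken.addAll(preserved);

        Map<String, String> mapping = new LinkedHashMap<>();
        int counter = 0;
        for (String name : names) {
            if (preserved.contains(name)) {
                mapping.put(name, name); // listed in a PRESERVE directive: identity kept
                continue;
            }
            String fresh;
            do {
                fresh = "n" + counter++;
            } while (!taken.add(fresh)); // skip candidates that collide with a used name
            mapping.put(name, fresh);
        }
        return mapping;
    }
}
```

Reflective lookups such as a lookup of class M by name would
continue to succeed after this step, because M maps to itself.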

The incorporation of "conditionally preserve" directives 
in configuration files has the advantage that one can write a 
single configuration file for a class library, which can be
shared by multiple applications that use that library. 
Moreover, each application can be extracted precisely, in a 
way that only extracts the library functionality used by that 
application. 

This enables a distribution model for class libraries in
which the vendor of a library L supplies a configuration file
F with unconditional and/or conditional directives for
classes, methods, and fields that are accessed in L,
preferably via a reflection mechanism in L. In order to
extract an L-based application A, the user augments F with
additional directives for A, and uses the resulting
configuration file F* to precisely extract the used
components of L and A.

The advantage of this distribution model is that the author 
of the class library (who is familiar with the use of reflection 
in the library) writes the configuration file for the library, and
that the end-user can precisely extract the library compo- 
nents required by his application, without having detailed 
knowledge of the library code. 

The technique of the present invention may also be 
integrated into an application for program understanding and
debugging. More specifically, the methodology described 
above may be applied to an input program that uses methods 
in a class library to generate data that identifies components 
of the input program and class library that are potentially 
used in the execution of the input program based upon the
conditional directive. The identified component(s) are then
reported to the user via a display device or other user
interface device for program understanding purposes or
debugging purposes.

While the invention has been described above with
respect to particular embodiments thereof, those skilled in 
the art will recognize that the invention can be practiced with 
modification within the spirit and scope of the appended 
claims. 

We claim:

1. A method for analyzing an object-oriented program
implementing reflection comprising a plurality of
components, the method comprising the steps of:

providing data including at least one conditional directive,
wherein the conditional directive specifies i) a condition
associated with a first set of components, and ii) a
second set of components that are to be classified as
live if the condition is satisfied, wherein the plurality of
components comprise classes, methods, and fields;

determining whether a given class is live when the given
class may be instantiated by reflection during any
execution of the program;

determining whether a given method is live when the
given method may be called by reflection during any
execution of the program;

determining whether a given field is live when the given
field may be accessed by reflection during any execution
of the program; and

generating data identifying components that are live based
upon the conditional directive.

2. The method of claim 1, wherein the generating step 
comprises the steps of: 

(a) constructing an initial call graph of the program; 

(b) extending the call graph using information about 
components that have been classified as live; 

(c) extending the call graph using the conditional direc- 
tive; and 

(d) identifying in the program any component that is 
classified as live; 

where steps (b), (c), and (d) are preferably performed 
repeatedly until no further components are added to the 
call graph. 

3. The method of claim 2, wherein the condition associ- 
ated with the first set of components is based upon useful- 
ness of the first set of components during any execution of 
the program. 

4. The method of claim 3, wherein the condition associ- 
ated with the first set of components is satisfied if at least one 
component in the first set of components is used during any 
execution of the program. 

5. The method of claim 1, further comprising the step of 
generating a representation of the program based upon the 
data. 

6. The method of claim 5, wherein the representation of 
the program omits components that are determined not to be 
live.

7. The method of claim 1, further comprising the step of 
identifying at least one component of the program that is 
live, and reporting the at least one component to a user via 
a graphical user interface. 

8. The method of claim 1, wherein the first set of 
components include components that are part of a library 
used by the program. 

9. The method of claim 1, wherein the data including at 
least one conditional directive is derived from at least two 
distinct files, wherein one of the distinct files is associated 
with the program, and wherein another of the distinct files is
associated with the library. 

10. The method of claim 1, wherein one of the distinct 
files includes an import directive that provides a mechanism 
to import directives from another file. 

11. A program storage device readable by a machine, 
tangibly embodying a series of instructions executable by 
the machine to perform method steps for analyzing an 
object-oriented program implementing reflection compris- 
ing a plurality of components, the method steps comprising: 

providing data including at least one conditional directive, 
wherein the conditional directive specifies i) a condi- 
tion associated with a first set of components, and ii) a 
second set of components that are to be classified as 
live if the condition is satisfied, wherein the plurality of 
components comprise classes, methods, and fields; 

determining whether a given class is live when the given
class may be instantiated by reflection during any
execution of the program;

determining whether a given method is live when the
given method may be called by reflection during any
execution of the program;

determining whether a given field is live when the given
field may be accessed by reflection during any execution
of the program; and

generating data identifying components that are live based
upon the conditional directive.

12. The program storage device of claim 11, wherein the 
generating step comprises the steps of: 

(a) constructing an initial call graph of the program; 

(b) extending the call graph using information about 
components that have been classified as live; 

(c) extending the call graph using the conditional direc- 
tive; and 

(d) identifying in the program any component that is 
classified as live; 

where steps (b), (c), and (d) are preferably performed 
repeatedly until no further components are added to the 
call graph. 

13. The program storage device of claim 12, wherein the 
condition associated with the first set of components is based 
upon usefulness of the first set of components during any 
execution of the program. 

14. The program storage device of claim 13, wherein the
condition associated with the first set of components is
satisfied if at least one component in the first set of
components is used during any execution of the program.

15. The program storage device of claim 11, further
comprising the step of generating a representation of the
program based upon the data.

16. The program storage device of claim 15, wherein the 
representation of the program omits components that are 
determined not to be live. 

17. The program storage device of claim 11, further 
comprising the step of identifying at least one component of 
the program that is live, and reporting the at least one 
component to a user via a graphical user interface. 

18. The program storage device of claim 11, wherein the 
first set of components include components that are part of 
a library used by the program. 

19. The program storage device of claim 11, wherein the
data including at least one conditional directive is derived
from at least two distinct files, wherein one of the distinct
files is associated with the program, and wherein another of
the distinct files is associated with the library.

20. The program storage device of claim 11, wherein one
of the distinct files includes an import directive that provides
a mechanism to import directives from another file.

* * * * *